Guide - Using AI Telephone Bots

AI Telephone Bots can run as voice agents or advanced IVR flows. They can gather data, call external systems, and perform call-control actions.

Basic Configuration

AI Bots are configured under console -> Stuff -> Add -> AI Bot.

A valid config must include a root description.

description: >
  You are an AI telephone bot for Widgets Ltd.
  Welcome the caller and collect their name.

initial: Thank you for calling Widgets Ltd, please tell me your name.

initial is optional. If set, it is spoken as the first assistant message for that scope (root/context/step).

Models and Language

If model is omitted (or unknown), the bot defaults to mistral-large-2512.

Currently supported model keys:

  • gpt-3.5-turbo
  • gpt-4o
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
  • mistral-large-2512
  • mistral-small-2506

temperature is clamped to 0..2.

Optional language should be an ISO 639 language code and is used as a hint for speech-to-text.
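Taken together, a minimal sketch of these root-level settings (the values shown are illustrative):

model: gpt-4o-mini
temperature: 0.4
language: en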

Text-to-Speech

Text-to-speech engine and voice can be configured:

tts:
  engine: polly # or mistral
  voice: Amy    # Polly voice ID, or a Mistral voice slug

Supported engines:

  • polly (default) — AWS Polly voices (e.g. Amy, Brian, Emma)
  • mistral — Mistral voices (e.g. gb_oliver_neutral, gb_jane_neutral, en_paul_neutral, fr_marie_neutral)

Speech Input and Guarding

Speech-to-text can be configured with:

stt:
  engine: voxtral # or openai-whisper
  interrupt: true # enable barge-in
  bargeinpower: 10 # power threshold for barge-in detection
  bargeinpoweraveragepackets: 5 # packets to average for power detection

When interrupt is true, speech during prompt playback interrupts the prompt and starts recording immediately (barge-in). bargeinpower and bargeinpoweraveragepackets control the sensitivity of barge-in detection.

For stricter turn-taking, use guard.in at root, context, or step scope (same precedence as other scoped settings: step -> context -> root).

guard.in supports:

  • description: instruction text for the guard classifier (JEXL templates supported)
  • model: optional model override for guard classification (defaults to mistral-small-2506)
  • allow: array of free-text allow rules (JEXL templates supported)
  • correct: boolean to enable speech-to-text correction on caller input
  • action: what to do when input is classified bad

Example:

guard:
  in:
    description: >
      Check caller input against the current prompt.
      Be lenient to natural phrasing.
    model: mistral-small-2506
    allow:
      - Caller answers the prompt
      - Caller asks to speak with a person
    action:
      reprompt: Sorry, I did not catch that. ${{ prompt }}

Speech-to-text correction

correct: true enables automatic correction of speech-to-text errors and regional dialect artifacts. The guard classifier already receives the prompt and the caller's transcript, so it can infer what the caller likely meant.

This is useful when callers have strong regional accents. For example, when a caller from Liverpool is asked "are you the patient or the carer?", the reply may be transcribed as "cara" — with correct: true, the guard corrects this to "carer" before it reaches the main AI.

correct can be used on its own (without description or allow) for correction-only, or combined with guard for both validation and correction:

# correction only
guard:
  in:
    correct: true

# guard + correction
guard:
  in:
    description: Check caller input is relevant to the question.
    correct: true
    allow:
      - Caller answers the prompt
    action:
      reprompt: Sorry, I did not catch that. ${{ prompt }}

Like other guard settings, correct follows step -> context -> root precedence. A step can set correct: false to disable correction inherited from a higher scope.
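For example, correction can be enabled at context scope and switched off for a single step (the context and step shown are illustrative):

contexts:
  intake:
    guard:
      in:
        correct: true
    steps:
      - description: Ask for the caller's surname.
      - description: Ask for the caller's phone number.
        guard:
          in:
            correct: false # digit sequences rarely need dialect correction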

Runtime behavior

  • The spoken prompt is always what is passed to STT.
  • If guard.in is not configured, input is accepted normally.
  • If guard.in is configured, a second AI classification pass is run on each captured user input.
  • When correct: true, the classifier may also return a corrected version of the transcript which replaces the original before it reaches the main AI.
  • Guard classifier failures are fail-closed (treated as bad input).
  • On bad input, guard.in.action runs (reprompt, finish, hangup, or annotate).
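As a sketch, a guard action can also end the call outright; this assumes the tool and final keys behave in guard actions as they do in entry actions:

guard:
  in:
    description: Reject abusive language.
    action:
      tool: hangup
      final: This call is now ending. Goodbye.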

Annotate instead of reject

A strict guard can frustrate callers — if the guard rejects unclear input, the caller simply recycles through the same prompt. action.annotate offers a softer alternative: instead of rejecting, the caller's transcript is passed through to the main LLM wrapped in a tag, so the main AI can decide how to handle it (e.g. ask the caller to spell their name).

guard:
  in:
    description: Check caller input is a plausible name.
    action:
      annotate: "[guard: possible STT error - proceed with caution] {{input}}"

annotate accepts:

  • true — use the default template ([guard: possible STT error - proceed with caution] {{input}})
  • a string — a custom template where {{input}} is replaced with the caller's raw transcript

When using the long form, action.tool: annotate can be combined with bad and empty templates:

guard:
  in:
    action:
      tool: annotate
      bad: "[guard: unclear answer] {{input}}"
      empty: "[quiet murmuring, not understood]"

When using annotate, add guidance to the main AI's system prompt telling it how to handle bracketed tags — e.g. "Input wrapped in [...] is metadata from the STT layer; do not read it aloud. If you see [guard: ...], ask the caller to clarify or spell their answer."

Full example: passing the error to the main LLM

Here a patient is asked for their surname. STT often mangles names, so rather than rejecting and making the caller repeat themselves, the guard annotates the transcript and hands it to the main LLM, which already has instructions to ask the caller to spell the name.

description: >
  You are a receptionist taking a patient callback request. Your job is to
  collect the caller's surname and phone number, then confirm.

  When the caller's input arrives wrapped in square brackets (e.g.
  "[guard: ...] smith"), treat the bracketed portion as a private note
  from the speech-to-text layer — DO NOT read it back to the caller.

  If you see "[guard: possible STT error ...]", the transcript is
  uncertain. Ask the caller politely to spell their answer letter by
  letter instead of repeating the question verbatim.

  If you see "[quiet murmuring, not understood]", the caller said nothing
  intelligible. Check they can hear you, then repeat the question once.

contexts:
  collectname:
    description: Ask the caller for their surname.
    guard:
      in:
        description: Check the caller gave a plausible surname.
        allow:
          - Caller said a recognisable name
          - Caller is spelling a name letter by letter
        action:
          tool: annotate
          bad: "[guard: possible STT error - surname unclear] {{input}}"
          empty: "[quiet murmuring, not understood]"
    steps:
      - description: Ask for surname, confirm once you have it, then move on.

Example conversation:

Assistant: "What is your surname please?"
Caller (STT): "ffff"
Guard: classifies as bad → wraps input
Main LLM sees: [guard: possible STT error - surname unclear] ffff
Assistant: "Sorry, I didn't quite catch that — could you spell your surname for me, letter by letter?"
Caller (STT): "s m i t h"
Guard: accepts (matches the "spelling a name" rule)
Assistant: "Thank you — so that's Smith, is that right?"

Notice the main LLM picks a more helpful reply than the generic guard reprompt would, because it saw why the guard was suspicious and adapted its strategy.

When to use annotate vs reprompt

  • Use reprompt when the rejection is deterministic and you want tight control of the error wording (e.g. "I can only accept yes or no.").
  • Use annotate when the main LLM has richer context and can recover more gracefully — names, free-form symptoms, addresses, anything where "please spell it" or "please describe it differently" is a better recovery than repeating the same question.
  • Combine them by scope: set a strict reprompt on a yes/no step, and a looser annotate at the context level for open-ended questions.
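The scoped combination in the last point might look like this (the context and steps shown are illustrative):

contexts:
  triage:
    guard:
      in:
        description: Check input is a plausible answer to the question.
        action:
          annotate: true
    steps:
      - description: Ask whether the caller consents to receiving an SMS.
        guard:
          in:
            description: Expect a yes or no answer.
            action:
              reprompt: I can only accept yes or no. ${{ prompt }}
      - description: Ask the caller to describe their symptoms.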

Tools and Permissions

Built-in tools:

  • send_sms
  • send_sms_caller
  • jump_extension
  • forward_call
  • hangup
  • finish

Example:

tools:
  send_sms:
    destinations:
      - 447700900123
  jump_extension:
    extensions:
      - "1000"
      - "1001"
  hangup: true
  finish: true

Permission resolution order is:

  1. step-level tools
  2. context-level tools
  3. root-level tools

A tool explicitly set to false at a narrower scope is denied even if enabled elsewhere.
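For instance, a root-enabled tool can be denied on a sensitive step (the context and step shown are illustrative):

tools:
  hangup: true

contexts:
  payments:
    steps:
      - description: Take the caller's card details.
        tools:
          hangup: false # denied here despite the root-level grant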

hangup and finish can be either:

  • true (AI provides final message)
  • object with final (fixed message enforced by config)

Example fixed final:

tools:
  hangup:
    final: Thank you for calling. Goodbye.

Variables and Session Values

Template format is ${{ ... }}.

Runtime vars available under var:

  • var.now
  • var.uuid
  • var.callerid

Date/time helper functions are available in JEXL compute/template expressions:

  • now()
  • now("YYYY-MM-DD HH:mm:ss")
  • now("YYYY-MM-DD HH:mm:ss", "UTC")
  • formatdatetime(input, "YYYY-MM-DD")
  • formatdatetime() (defaults to current date/time)
  • yearssince(input)

Useful formatdatetime tokens:

  • YYYY, YY
  • MM, M
  • DD, D
  • HH, H
  • mm, m
  • ss, s
  • MMM, MMMM
  • ddd, dddd

Examples:

session:
  today_utc:
    compute: now("YYYY-MM-DD", "UTC")
  timestamp:
    compute: formatdatetime()
  patient_dob_display:
    compute: formatdatetime(session.dob, "ddd, DD MMM YYYY")

Session values are under session.

You can pre-populate session values in config, including templates and compute expressions:

session:
  enquirer_telephone: ${{var.callerid}}
  callerid_len:
    compute: var.callerid.length

Within logic, last contains the most recent webhook result (e.g. last.success).
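A sketch of branching on last, assuming when accepts JEXL expressions over it (the step layout and webhook name are illustrative):

contexts:
  submit:
    steps:
      - action:
          webhook: submit_case
      - when: last.success
        description: Tell the caller the case was submitted.
      - when: '!last.success'
        description: Apologise and offer to take a message instead.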

Contexts, Steps, and Actions

Use start to set the first context.

start: intake

contexts:
  intake:
    description: Collect caller details.

Context Switching

Context switching is controlled by allowed context lists:

  • contexts.<name>.contexts
  • steps[].contexts

When switching, session values defined in collect are saved.
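For example, allowing one context to hand off to two others (the context names are illustrative):

contexts:
  intake:
    description: Collect caller details.
    contexts:
      - booking
      - complaints
  booking:
    description: Book an appointment.
  complaints:
    description: Take a complaint.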

Steps

Steps are ordered and run one at a time.

contexts:
  intake:
    description: Ask one question at a time.
    steps:
      - initial: What is your first and last name?
        collect:
          first_name:
            description: Caller first name
          last_name:
            description: Caller last name
      - description: What is your date of birth?
        collect:
          dob:
            description: Date of birth in YYYY-MM-DD

Important:

  • A string step (e.g. - "ask name") is treated as a step description.
  • It is not converted to initial.

when is supported on contexts and steps.

goto is supported on steps for context jumps:

- goto: another_context
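when and goto combine naturally for deterministic jumps; a sketch assuming a session value collected in an earlier step:

steps:
  - description: Ask whether the caller is an existing patient.
    collect:
      is_existing:
        description: Whether the caller is an existing patient
  - when: session.is_existing
    goto: existing_patient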

Entry Actions

Contexts and steps can define action blocks that run immediately on entry, before the normal AI turn:

action:
  webhook: submit_case

or

action:
  tool: hangup
  final: We have what we need. Goodbye.

Supported action targets:

  • tool: hangup
  • tool: finish
  • webhook: <name>

Webhooks

Webhooks are function tools the AI can call.

Required webhook keys:

  • description
  • url
  • fields

Example:

webhooks:
  submit_case:
    description: Send case to CRM
    url: https://example.com/api/cases
    method: POST
    content_type: application/json

    expect:
      status: 200
      content_type: application/json

    headers:
      Authorization: Bearer ${{secret.crm_token}}

    fields:
      callerid:
        type: string
        value: ${{var.callerid}}
      dob:
        type: string
        description: Date of birth in YYYY-MM-DD
      age:
        compute: yearssince(session.dob)

Notes:

  • Default method is POST.
  • Default request content_type is application/json.
  • Supported request bodies: JSON and application/x-www-form-urlencoded.
  • For JSON payloads, path can map nested objects.
  • required: false omits missing/empty values.
  • expect.status must match exactly when provided.
  • Without expect, success defaults to HTTP 200 or 202.
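A sketch of path and required together (the field and path names are illustrative):

fields:
  surname:
    type: string
    path: patient.surname # nested in the JSON body as {"patient": {"surname": ...}}
    value: ${{ session.last_name }}
  nhs_number:
    type: string
    description: NHS number if the caller knows it
    required: false # omitted from the payload when missing or empty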

Webhook Scope by Context/Step

Webhook exposure can be restricted by:

  • context-level webhooks
  • step-level webhooks (array or object allow/deny)

This lets you grant webhook access only where needed.
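For example, exposing a webhook only inside one context and denying it on a confirmation step (names are illustrative; this assumes the array form lists allowed webhooks):

contexts:
  casework:
    webhooks:
      - submit_case
    steps:
      - description: Gather the case details.
      - description: Read the details back and confirm.
        webhooks: [] # no webhooks callable during confirmation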

Mailto Webhooks

url: mailto:someone@example.com is supported.

For mailto webhooks, you can define subject and body as template strings or compute objects.

webhooks:
  submitprescription:
    description: Email prescription request
    url: mailto:ops@example.com
    subject: New prescription request
    body: |
      Caller: ${{ session.first_name }} ${{ session.last_name }}
      Telephone: ${{ session.enquirer_telephone }}
      Recording: https://www.babblevoice.com/a/callexplorer?u=${{ var.uuid }}
      Transcript:
      ${{ fields.data }}
    fields:
      data:
        compute: >
          messages|chattext({ caller: session.first_name, assistant: "Bot" })

RAG Search Tool

You can expose a search tool that queries MiniRAG.

tools:
  search:
    - url: kb://prescriptions
      purpose: NHS prescription policy documents

The AI then receives a search(url, query) function constrained to configured URLs.

Execution Notes

  • Root system prompt is always based on root description.
  • Active context description is appended when in a context.
  • Active step description is appended when in steps.
  • Initial prompt precedence is:
      1. current step initial
      2. current context initial
      3. root initial (only when not in a context)
  • On context switch or step completion, message history is sliced to the new base index to keep prompts focused.
  • AI tool loops and entry-action loops are capped to avoid runaway behavior.
  • Guard input retries are limited per prompt to avoid infinite reprompt loops.

Practical Recommendation

Keep permissions tight:

  • expose only required tools
  • expose only required webhooks per context/step
  • prefer deterministic action + when for critical workflow transitions